Online Unsupervised Multi-view Feature Selection

Abstract

In the era of big data, it is becoming common to have data with multiple modalities or from multiple sources, known as "multi-view data". Because multi-view data are usually unlabeled and come from high-dimensional spaces (such as language vocabularies), unsupervised multi-view feature selection is crucial to many applications. However, it is nontrivial because of the following challenges. First, there may be too many instances, or the feature dimensionality may be too large, so the data may not fit in memory: how can useful features be selected with limited memory? Second, how can features be selected from streaming data while handling concept drift? Third, how can the consistent and complementary information from different views be leveraged to improve feature selection when the data are too big or arrive as streams? To the best of our knowledge, no previous work solves all of these challenges simultaneously. In this paper, we propose Online unsupervised Multi-View Feature Selection (OMVFS), which handles large-scale or streaming multi-view data in an online fashion. OMVFS embeds unsupervised feature selection into a clustering algorithm via nonnegative matrix factorization (NMF) with sparse learning. It further incorporates graph regularization to preserve local structure information and help select discriminative features. Instead of storing all the historical data, OMVFS processes the multi-view data chunk by chunk and aggregates all the necessary information into several small matrices. By using a buffering technique, OMVFS reduces the computational and storage cost while taking advantage of the structure information. Furthermore, OMVFS can capture concept drift in the data streams. Extensive experiments on four real-world datasets show the effectiveness and efficiency of the proposed method. More importantly, OMVFS is about 100 times faster than the off-line methods.
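The chunk-by-chunk idea in the abstract can be illustrated with a minimal sketch. This is not the OMVFS algorithm itself: it assumes a plain shared-factor NMF objective (each view's chunk X is approximated by U Vᵀ with a cluster indicator U shared across views), an L2,1 sparsity penalty on each feature matrix, and standard multiplicative updates. It omits the graph regularization and the buffer entirely. The names `update_chunk`, `top_features`, `lam`, `A`, and `B` are hypothetical. What it does show is how past chunks can be summarized by a few small matrices whose size does not depend on the number of instances seen.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero in the multiplicative updates


def update_chunk(chunks, V, A, B, lam=0.1, n_iter=30, rng=None):
    """Process one chunk of multi-view data (illustrative sketch only).

    chunks: list of nonnegative (n_t, d_v) arrays, one per view
    V:      list of nonnegative (d_v, k) feature matrices, one per view
    A:      (k, k) running sum of U_t^T U_t over all chunks so far
    B:      list of (d_v, k) running sums of X_t^T U_t, one per view
    """
    rng = np.random.default_rng() if rng is None else rng
    k = A.shape[0]
    U = rng.random((chunks[0].shape[0], k))

    # Step 1: hold every V fixed and fit the cluster indicator U,
    # which is shared by all views of this chunk.
    gram = sum(Vv.T @ Vv for Vv in V)
    num = sum(X @ Vv for X, Vv in zip(chunks, V))
    for _ in range(n_iter):
        U *= num / (U @ gram + EPS)

    # Step 2: fold the chunk into the small summary matrices.
    # After this the raw chunk is no longer needed.
    A = A + U.T @ U
    B = [Bv + X.T @ U for Bv, X in zip(B, chunks)]

    # Step 3: refresh each view's V from the summaries alone.
    # The lam term is the gradient of the assumed L2,1 penalty.
    new_V = []
    for Vv, Bv in zip(V, B):
        Vv = Vv.copy()
        for _ in range(n_iter):
            row_norm = np.linalg.norm(Vv, axis=1, keepdims=True) + EPS
            Vv *= Bv / (Vv @ A + lam * Vv / (2 * row_norm) + EPS)
        new_V.append(Vv)
    return new_V, A, B


def top_features(Vv, m):
    """Rank features of one view by the row norm of V and keep the top m."""
    scores = np.linalg.norm(Vv, axis=1)
    return np.argsort(scores)[::-1][:m]


# Toy stream: two views with 20 and 15 features, five chunks of 50 instances.
rng = np.random.default_rng(0)
dims, k = [20, 15], 3
V = [rng.random((d, k)) for d in dims]
A = np.zeros((k, k))
B = [np.zeros((d, k)) for d in dims]
for _ in range(5):
    chunk = [rng.random((50, d)) for d in dims]
    V, A, B = update_chunk(chunk, V, A, B, rng=rng)
selected = [top_features(Vv, 5) for Vv in V]
```

In this sketch the state carried between chunks is one k × k matrix plus one d_v × k matrix per view, so memory stays constant as the stream grows. The toy data are uniform noise, so the selected indices carry no meaning; the example only demonstrates the data flow.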
